# Study information

The goal of this study was to examine whether analysts in German cancer registries need assistance during data analysis. Since 2020, many epidemiological cancer registries have been entrusted with answering clinical questions; in other cases, new clinical-epidemiological cancer registries have been established. In most cases, the analysts of the epidemiological cancer registries also carry out the evaluations that answer these clinical questions. Clinical cancer registry questions focus on the quality assurance of treatment and therefore require filtering the data based on diagnosis and treatment information. Epidemiological questions, in contrast, focus on the patients' place of residence and their diagnosis. It is therefore questionable whether existing tools and approaches are suitable for clinical questions, and analysts of epidemiological cancer registries might need assistance with these types of analyses. The study during which these data were collected aims to answer this question.

Analysts from German cancer registries were recruited for this study and asked to perform three exercises of increasing difficulty, moving from epidemiological feature selection to clinical feature selection. The study sessions were conducted in German. Participants were asked to complete the exercises as independently as possible; the study supervisor could assist when needed and documented each instance of assistance. Changes to the graphical user interface were recorded automatically, so the study supervisor can review an execution afterwards without having to record anything about the participant. Only the state of the application's graphical user interface was recorded.

This repository contains only the recorded changes to the graphical user interface, not the supervisor's documentation of the assistance given to participants.

# Exercises

## German

### 1

Analysieren Sie die Anzahl und den Anteil der Fälle nach den UICC Stadien 0, I, II, III, IV, OCA und X sowie die Summe der einzelnen Ausprägungen.

### 2

Analysieren Sie den Anteil der Fälle in Bezug auf Diagnosesicherung und Grading.

### 3

Analysieren Sie die Anzahl der Fälle nach lokalem Residualstatus nach Operation R0, R1, R2, Rx, Y und allen lokalen Residualstadien.


## English

### 1

Analyze the number and percentage of cases by UICC stage (0, I, II, III, IV, OCA and X) as well as the sum over all individual values.

### 2

Analyze the percentage of cases in relation to diagnostic confirmation and grading.

### 3

Analyze the number of cases in relation to local residual status after surgery R0, R1, R2, Rx, Y and all local residual stages.

# Data set

The data for this study is available in the `Data` folder. Each subfolder corresponds to one study participant. The `Reference` and `Reference-DE` folders contain the baseline executions of the exercises performed by the study supervisor. `Reference-DE` uses the German localization of the data model, whereas all other executions were done with the English localization.
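The folder layout can be traversed with a short script. The following is a minimal sketch in TypeScript for Node.js, assuming that every subfolder of `Data` other than `Reference` and `Reference-DE` belongs to a participant. The function names `partitionFolders` and `listDataFolders` are hypothetical and not part of this repository.

```typescript
import { readdirSync } from "node:fs";

// Folder names of the supervisor's baseline executions, as described in this README.
const REFERENCE_FOLDERS = new Set(["Reference", "Reference-DE"]);

interface DataFolders {
  references: string[];
  participants: string[];
}

// Hypothetical helper: split subfolder names of `Data` into baseline and participant executions.
function partitionFolders(names: readonly string[]): DataFolders {
  const references: string[] = [];
  const participants: string[] = [];
  for (const name of names) {
    (REFERENCE_FOLDERS.has(name) ? references : participants).push(name);
  }
  return { references, participants };
}

// Hypothetical helper: read the subfolders of a data directory, e.g. listDataFolders("Data").
function listDataFolders(dataDir: string): DataFolders {
  const names = readdirSync(dataDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort();
  return partitionFolders(names);
}
```

Keeping the name-based split in a pure function makes it easy to test independently of the file system.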

Each participant's data is organized into three folders: `Base`, `Derived` and `Enriched`. The `Base` folder contains the raw data collected during the study. `Derived` contains preprocessed data that aggregates the raw data into a usable format. `Enriched` extends the `Derived` events with additional information, for example a distinction between a participant's explicit selections and the system's implicit selections.
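The distinction between explicit and implicit selections can be illustrated with a small classification step. This is a minimal sketch under loudly stated assumptions: a selection state is reduced to a list of element identifiers, and `clicked` holds the elements the participant chose directly. The element identifiers, `classifySelection` and the idea that selecting a group implicitly selects its members are hypothetical; the actual event structure is defined in `DataPreparation/src/event-types`.

```typescript
// Hypothetical shapes; the real event types live in DataPreparation/src/event-types.
type ElementId = string;

interface SelectionBreakdown {
  explicit: ElementId[]; // chosen directly by the participant
  implicit: ElementId[]; // selected by the system as a consequence
}

// Hypothetical helper: classify each selected element as explicit or implicit,
// assuming `clicked` contains the elements the participant selected directly.
function classifySelection(
  selected: readonly ElementId[],
  clicked: ReadonlySet<ElementId>,
): SelectionBreakdown {
  const explicit: ElementId[] = [];
  const implicit: ElementId[] = [];
  for (const id of selected) {
    (clicked.has(id) ? explicit : implicit).push(id);
  }
  return { explicit, implicit };
}

// Example with hypothetical identifiers: selecting a group implicitly selects its members.
const breakdown = classifySelection(["uicc", "uicc.I", "uicc.II"], new Set(["uicc"]));
console.log(breakdown.implicit); // → [ 'uicc.I', 'uicc.II' ]
```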

The data types for each event can be found in the `DataPreparation/src/event-types` folder. Its `Base` subfolder contains the type definitions for all base events, and its `Derived` subfolder contains the types of all events derived from the base events.

The data set also includes metadata about the underlying data model in the `DataPreparation/metadata.json` file. This file lists all selectable elements of the data model, grouped by their function.
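Because the metadata groups elements by function, a reverse lookup from an element to its group is handy when interpreting recorded selections. The following is a minimal sketch that assumes `metadata.json` maps each group name to an array of element identifiers. This shape is an assumption and should be checked against the actual file. `indexByElement` and `loadMetadata` are hypothetical helpers.

```typescript
import { readFileSync } from "node:fs";

// ASSUMED shape: group name -> identifiers of the selectable elements in that group.
// Verify against the actual structure of DataPreparation/metadata.json before use.
type Metadata = Record<string, string[]>;

// Hypothetical helper: build a lookup from element identifier to the group(s) it belongs to.
function indexByElement(metadata: Metadata): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const [group, elements] of Object.entries(metadata)) {
    for (const element of elements) {
      const groups = index.get(element) ?? [];
      groups.push(group);
      index.set(element, groups);
    }
  }
  return index;
}

// Hypothetical helper: parse the metadata file under the assumed shape.
function loadMetadata(path: string): Metadata {
  return JSON.parse(readFileSync(path, "utf8")) as Metadata;
}

// Usage (path relative to the repository root):
// const index = indexByElement(loadMetadata("DataPreparation/metadata.json"));
```

Storing an array of groups per element keeps the lookup correct even if an element appears in more than one group.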

# Data Preparation

The scripts used to prepare the data for further analysis are located in the `DataPreparation` folder. A current version of Node.js is required to run them. The preparation is divided into two steps:

1. Preparation of the base data, which aggregates the base events into logical units (stored in the `Derived` folder).
2. Enrichment, which adds information such as the logical difference between two graphical user interface states and the user's selections and deselections in the analysis model.
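The logical difference between two graphical user interface states can be thought of as a set difference over the selected elements. The following is a minimal sketch, assuming a state is simplified to the set of selected element identifiers. `GuiState`, `diffStates` and the example identifiers are hypothetical and do not reflect the repository's actual implementation.

```typescript
// Hypothetical simplification: a GUI state reduced to the set of selected element identifiers.
type GuiState = ReadonlySet<string>;

interface StateDiff {
  selected: string[]; // present in `next` but not in `previous`
  deselected: string[]; // present in `previous` but not in `next`
}

// Hypothetical helper: compute the logical difference between two GUI states.
function diffStates(previous: GuiState, next: GuiState): StateDiff {
  const selected = Array.from(next).filter((id) => !previous.has(id));
  const deselected = Array.from(previous).filter((id) => !next.has(id));
  return { selected, deselected };
}

// Example with hypothetical identifiers.
const beforeState: GuiState = new Set(["diagnosis", "uicc"]);
const afterState: GuiState = new Set(["diagnosis", "grading"]);
const change = diffStates(beforeState, afterState);
// → selected: ["grading"], deselected: ["uicc"]
```

Elements present in both states (here `diagnosis`) produce no event, so only actual selections and deselections remain.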